Named Entity Recognition on Twitter for Turkish using Semi-supervised Learning with Word Embeddings
Abstract
With the growing popularity of social media, extracting information from informal text types, such as microblog texts, has attracted significant attention. In this study, we focused on the Named Entity Recognition (NER) problem on informal text types for Turkish. We used a semi-supervised learning approach based on neural networks: we applied a fast unsupervised method to learn continuous vector-space representations of words, and then combined the resulting word embeddings with language-independent features engineered to work better on informal text types to build a Turkish NER system for microblog texts. We evaluated the system on Twitter messages and achieved higher F-scores than the published results of previously proposed NER systems on Turkish tweets. Since we did not employ any language-dependent features, we believe that our method can be easily adapted to microblog texts in other morphologically rich languages.
Similar resources
Semi-supervised Bio-named Entity Recognition with Word-Codebook Learning
We describe a novel semi-supervised method called Word-Codebook Learning (WCL), and apply it to the task of bio-named entity recognition (bioNER). Typical bioNER systems can be seen as tasks of assigning labels to words in bio-literature text. To improve supervised tagging, WCL learns a class of word-level feature embeddings to capture word semantic meanings or word label patterns from a large unl...
Multimedia Lab @ ACL WNUT NER Shared Task: Named Entity Recognition for Twitter Microposts using Distributed Word Representations
Due to the short and noisy nature of Twitter microposts, detecting named entities is often a cumbersome task. As part of the ACL 2015 Named Entity Recognition (NER) shared task, we present a semi-supervised system that detects 10 types of named entities. To that end, we leverage 400 million Twitter microposts to generate powerful word embeddings as input features and use a neural network to execu...
sisinflab: an Ensemble of Supervised and Unsupervised Strategies for the NEEL-IT Challenge at Evalita 2016
English. This work presents the solution adopted by the sisinflab team to solve the task NEEL-IT (Named Entity rEcognition and Linking in Italian Tweets) at the Evalita 2016 challenge. The task consists in the annotation of each named entity mention in a Twitter message written in Italian, among characters, events, people, locations, organizations, products and things and the eventual linking w...
Word Representations: A Simple and General Method for Semi-Supervised Learning
If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word repre...
Semi-supervised sequence tagging with bidirectional language models
Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pret...